Encoder-Decoder Based Attractors for End-to-End Neural Diarization

Authors

Abstract

This paper investigates an end-to-end neural diarization (EEND) method for an unknown number of speakers. In contrast to the conventional cascaded approach to speaker diarization, EEND methods are better in terms of speaker overlap handling. However, EEND still has a disadvantage in that it cannot deal with a flexible number of speakers. To remedy this problem, we introduce an encoder-decoder-based attractor calculation module (EDA) to EEND. Once frame-wise embeddings are obtained, EDA sequentially generates speaker-wise attractors on the basis of a sequence-to-sequence method using an LSTM encoder-decoder. The attractor generation continues until a stopping condition is satisfied; thus, the number of attractors can be flexible. Diarization results are then estimated as dot products of the attractors and embeddings. The embeddings from speaker overlaps result in larger dot product values with multiple attractors; thus, this method can deal with speaker overlaps. Because the maximum number of output speakers is still limited by the training set, we also propose an iterative inference method to remove this restriction. Further, we propose a method that aligns the estimated diarization results with the results of an external speech activity detector, which enables fair comparison against cascaded approaches. Extensive evaluations on simulated and real datasets show that EEND-EDA outperforms the conventional cascaded approach.
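The abstract describes the EDA inference loop only in words. The following is a minimal, hypothetical sketch of that loop under stated assumptions, not the authors' code: an LSTM encoder reads the frame-wise embeddings, an LSTM decoder started from the encoder's final state and fed zero vectors emits one candidate attractor per step, and a linear layer followed by a sigmoid (assumed here, together with the 0.5 threshold) gives an existence probability that acts as the stopping condition. Speaker activities are sigmoids of embedding-attractor dot products, so one frame can be active for several speakers. All names (`LSTMCell`, `eda_inference`, `w_exist`, `b_exist`, `max_speakers`) are illustrative, the weights are random rather than trained, and the `max_speakers` cap is only a safeguard for untrained weights; it is not the paper's iterative inference method.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
State = Tuple[Vector, Vector]  # (hidden state h, cell state c)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class LSTMCell:
    """Single LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # 4 * hidden_dim rows: input, forget, candidate and output gates, stacked.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                  for _ in range(4 * hidden_dim)]
        self.b = [0.0] * (4 * hidden_dim)

    def step(self, x: Vector, state: State) -> State:
        h, c = state
        z = x + h  # list concatenation of input and previous hidden state
        pre = [dot(row, z) + bias for row, bias in zip(self.w, self.b)]
        n = self.hidden_dim
        i = [sigmoid(v) for v in pre[:n]]
        f = [sigmoid(v) for v in pre[n:2 * n]]
        g = [math.tanh(v) for v in pre[2 * n:3 * n]]
        o = [sigmoid(v) for v in pre[3 * n:]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def eda_inference(embeddings: List[Vector], encoder: LSTMCell, decoder: LSTMCell,
                  w_exist: Vector, b_exist: float,
                  threshold: float = 0.5, max_speakers: int = 4):
    """Return (attractors, posteriors) for one recording.

    Assumes embeddings, attractors and LSTM hidden states share dimension D.
    """
    d = encoder.hidden_dim
    state: State = ([0.0] * d, [0.0] * d)
    # Encoder: summarize the whole embedding sequence into its final state.
    for e in embeddings:
        state = encoder.step(e, state)
    # Decoder: start from the encoder state and feed zero vectors; each output
    # is a candidate attractor, kept while its existence probability is at or
    # above the threshold.
    attractors: List[Vector] = []
    while len(attractors) < max_speakers:
        state = decoder.step([0.0] * d, state)
        h = state[0]
        if sigmoid(dot(w_exist, h) + b_exist) < threshold:
            break  # stopping condition: no further speaker
        attractors.append(h)
    # Frame-wise, speaker-wise activity: sigmoid of embedding-attractor dot
    # products. A frame may be active for several speakers (overlap).
    posteriors = [[sigmoid(dot(e, a)) for a in attractors] for e in embeddings]
    return attractors, posteriors


if __name__ == "__main__":
    rng = random.Random(0)
    D, T = 4, 6
    emb = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
    enc, dec = LSTMCell(D, D, rng), LSTMCell(D, D, rng)
    w = [rng.uniform(-0.5, 0.5) for _ in range(D)]
    attrs, post = eda_inference(emb, enc, dec, w, b_exist=0.0)
    print(f"{len(attrs)} attractors; frame 0 activities: {post[0]}")
```

Because the weights are untrained, the number of attractors the demo produces is arbitrary. In a trained model, the existence probabilities are what let the speaker count vary per recording, and thresholding each posterior (rather than taking an argmax over speakers) is what allows overlapping speech to be assigned to more than one speaker.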

Similar resources

Handwriting Trajectory Recovery using End-to-End Deep Encoder-Decoder Network

In this paper, we introduce a novel technique to recover the pen trajectory of offline characters which is a crucial step for handwritten character recognition. Generally, online acquisition approach has more advantage than its offline counterpart as the online technique keeps track of the pen movement. Hence, pen tip trajectory retrieval from offline text can bridge the gap between online and ...

End-to-End Trainable Attentive Decoder for Hierarchical Entity Classification

We address fine-grained entity classification and propose a novel attention-based recurrent neural network (RNN) encoder-decoder that generates paths in the type hierarchy and can be trained end-to-end. We show that our model performs better on fine-grained entity classification than prior work that relies on flat or local classifiers that do not directly model hierarchical structure.

End-to-end esophagojejunostomy versus standard end-to-side esophagojejunostomy: which one is preferable?

Abstract Background: End-to-side esophagojejunostomy has almost always been associated with some degree of dysphagia. To overcome this complication we decided to perform an end-to-end anastomosis and compare it with end-to-side Roux-en-Y esophagojejunostomy. Methods: In this prospective study, between 1998 and 2005, 71 patients with a diagnosis of gastric adenocarcinoma underwent total gastrec...

JEJUNAL EVERSION MUCOSECTOMY AND INVAGINATION: AN INNOVATIVE TECHNIQUE FOR THE END TO END PANCREATICOJEJUNOSTOMY

ABSTRACT Background: The pancreatojejunostomy has notoriously been known to carry a high rate of operative complications, morbidity and mortality, mainly due to anastomotic leak and ensuing septic complications. Objective: In order to decrease anastomotic leak and its attendant morbidity and mortality in operations requiring a pancreato-jejunal anastomosis, and also in order to simplify the op...

End-to-End Neural Speech Synthesis

In recent years, end-to-end neural networks have become the state of the art for speech recognition tasks and they are now widely deployed in industry (Amodei et al., 2016). Naturally, this has led to the creation of systems to do the opposite – end-to-end speech synthesis from raw text. Very recently, neural TTS systems have become highly competitive with their conventional counterparts, showi...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2022

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2022.3162080